Logistic Regression Tree Analysis
Abstract
This chapter describes a tree-structured extension and generalization of the logistic regression method for fitting models to a binary-valued response variable. The technique overcomes a significant disadvantage of logistic regression, namely the difficulty of interpreting the model in the face of multicollinearity and Simpson’s paradox. Section 1 summarizes the statistical theory underlying the logistic regression model and the estimation of its parameters. Section 2 reviews two standard approaches to model selection for logistic regression, namely, model deviance relative to its degrees of freedom and the AIC criterion. A dataset on tree damage during a severe thunderstorm is used to compare the approaches and to highlight their weaknesses. A recently published partial one-dimensional model that addresses some of the weaknesses is also reviewed. Section 3 introduces the idea of a logistic regression tree model. The latter consists of a binary tree in which a simple linear logistic regression (i.e., a linear logistic regression using a single predictor variable) is fitted to each leaf node. A split at an intermediate node is characterized by a subset of values taken by a (possibly different) predictor variable. The objective is to partition the dataset into rectangular pieces according to the values of the predictor variables such that a simple linear logistic regression model adequately fits the data in each piece. Because the tree structure and the piecewise models can be presented graphically, the whole model can be easily understood. This is illustrated with the thunderstorm dataset using the LOTUS algorithm. Section 4 describes the basic elements of the LOTUS algorithm, which is based on recursive partitioning and cost-complexity pruning. A key feature of the algorithm is a correction for bias in variable selection at the splits of the tree. Without bias correction, the splits can yield incorrect inferences.
Section 5 shows an application of LOTUS to a dataset on automobile crash tests involving dummies. This dataset is challenging because of its large size, its mix of ordered and unordered variables, and its large number of missing values. It also provides a demonstration of Simpson’s paradox. The chapter concludes with some remarks in Section 6.
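The idea of fitting a simple linear logistic regression within each piece of a partition can be illustrated with a minimal sketch. It is not the LOTUS algorithm: there is no bias-corrected variable selection, no recursive partitioning, and no pruning. The data are synthetic, the single split on a binary group variable is fixed by hand, and the helper names (`fit_simple_logistic`, `make_group`) are hypothetical. The sketch only shows that a pooled one-predictor model can hide opposite trends that become visible, and reduce the deviance, once each piece is fitted separately.

```python
import math

def fit_simple_logistic(x, y, iters=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns (b0, b1, deviance), with deviance = -2 * log-likelihood.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:             # near-singular information matrix: stop
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    dev = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        dev -= 2.0 * (yi * math.log(p) + (1 - yi) * math.log(1.0 - p))
    return b0, b1, dev

def make_group(ones_per_x, n_per_x=10):
    """Synthetic data: at x = 0, 1, 2, ... there are n_per_x cases, k of them with y = 1."""
    x, y = [], []
    for xi, k in enumerate(ones_per_x):
        x += [float(xi)] * n_per_x
        y += [1] * k + [0] * (n_per_x - k)
    return x, y

# Assumed toy data: the response rises with x in one group and falls in the other.
rising = [1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
falling = [10 - k for k in rising]
x_left, y_left = make_group(rising)
x_right, y_right = make_group(falling)

# One model for all the data versus one model per piece of a hand-picked split.
pooled = fit_simple_logistic(x_left + x_right, y_left + y_right)
left = fit_simple_logistic(x_left, y_left)
right = fit_simple_logistic(x_right, y_right)

print("pooled slope:", pooled[1], "deviance:", pooled[2])
print("left slope:", left[1], "right slope:", right[1])
print("deviance after split:", left[2] + right[2])
```

In this toy example the pooled slope is zero because the two opposite trends cancel, while the two piecewise fits recover a positive and a negative slope. A real tree algorithm would also have to choose the split variable and split point from the data, which is where the selection bias mentioned in the abstract arises.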
Similar resources
Comparing the Results of Logistic Regression Model and Classification and Regression Tree Analysis in Determining Prognostic Factors for Coronary Artery Disease in Mashhad, Iran
Background and purpose: Understanding of the risk factors for cardiovascular artery disease, which is the leading cause of death worldwide, can lead to essential changes in its etiology, prevalence, and treatment. The aim of this study was to compare the results of logistic regression model and Classification and Regression Tree Analysis (CART) in determining the prognostic factors for coronary...
Comparison of Gestational Diabetes Prediction Between Logistic Regression, Discriminant Analysis, Decision Tree and Artificial Neural Network Models
Background and Objectives: Gestational Diabetes Mellitus (GDM) is the most common metabolic disorder in pregnancy. In case of early detection, some of its complications can be prevented. The aim of this study was to investigate early prediction of GDM by logistic regression (LR), discriminant analysis (DA), decision tree (DT) and perceptron artificial neural network (ANN) and to compare these m...
Ranking stocks of listed companies on Tehran stock exchange using a hybrid model of decision tree and logistic regression
Much research has introduced linear or nonlinear models using statistical models and machine learning tools in artificial intelligence to estimate Iran's rate of return. The primary purpose of these methods is to simultaneously use different independent variables to improve the modeling of stock return rates. However, in predicting the rate of return, in addition to the modeling method, the degree of co...
Comparison of Decision Tree and Logistic Regression Models in the Evaluation of Osteoporosis
Introduction: Early detection of osteoporosis is key to preventing it, but recognition without appropriate diagnostic methods is difficult because of the complexity of the risk factors and the gradual process of bone loss. The purpose of this study is to develop and evaluate the efficiency of a predictive model of osteoporosis using the decision tree technique as a diagnostic method based on available...
Tree Induction vs. Logistic Regression: A Learning-Curve Analysis
Tree induction and logistic regression are two standard off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size...
A Comparison of Decision Tree with Logistic Regression Model for Prediction of Worst Non-Financial Payment Status in Commercial Credit
Credit risk prediction is an important problem in the financial services domain. While machine learning techniques such as Support Vector Machines and Neural Networks have been used for improved predictive modeling, the outcomes of such models are not readily explainable and, therefore, difficult to apply within financial regulations. In contrast, Decision Trees are easy to explain, and provide...